Abstract

Using a quasi-experimental ANOVA design, this project examined the effects of the use of
accommodations with LEP students and non-LEP students and whether the use of accommodations
affected the validity of test score interpretations. Major accommodations examined were extra time
and extra time with extended oral presentation. Samples of 4th and 7th grade students were tested
using the Terranova multiple assessment math test, as well as a math skills test and the LAS reading
comprehension test. Descriptive findings showed that LEP students scored lower than non-LEP
students on math tests and teacher-reported skill levels. Major predictors of math achievement were
LAS reading proficiency level (a proxy of LEP status), whether students received an
accommodation, and teacher rating of reading skill. ANOVA analyses were conducted to compare
the mean scores of students in accommodated tests vs. those with no accommodation. These
showed the accommodation effect was significant, with those students in the extra time condition
showing the highest scores. A discriminant analysis showed that the best predictors of membership
in different English fluency groups were Spanish fluency, time in US, reading grade and math test
score. LEP students were more likely than non-LEP students to be misclassified into a fluency
group. Examination of students’ writing samples showed clear differences in mathematics and
language achievement, depending on the student’s language proficiency level. The project’s results
provide information on whether accommodations provide valid inferences for special needs children,
as well as non-LEP children.









Acknowledgments



The author would like to express appreciation to the Delaware Department of Education, State of
Delaware, for funds received. Funds were initially received from the U.S. Department of Education.
These funds enabled me to carry out the study. Publication of this report and the views expressed
do not imply endorsement by the State of Delaware or endorsement of the views by the U.S.
Department of Education.

I would particularly like to thank Nancy Maihoff, from the State of Delaware, who has been so
supportive throughout the process, and Liru Zhang, who was very helpful.

I really appreciate the assistance given me by Kyoko Ito at CTB/McGraw Hill. I could not have done
it without her gracious help.

In addition, the research would not have been possible without the guidance of a number of
individuals on the California State University, Los Angeles team. They include the following:



Jamal Abedi, UCLA 

Fery Hejri, ARDAC 

Sharon Ulanoff, CSU San Marcos 

Kyoko Ito, CTB/McGraw Hill 

Cheryl Gilera, CSU Los Angeles 

Terry Ray, CSU Los Angeles 

Tracy Lee, CSU Los Angeles 

Curt Meams, Albuquerque Public Schools 

Participating Teachers and Coordinators 



Evaluating the Impact of Assessment Accommodations 

on Test Scores of 

LEP Students and Non-LEP Students 

Table of Contents 



Acknowledgments 



I. Introduction and Objectives 

II. Review of the Literature 



III. Methods 

A. Sampling and subjects 

B. Independent and dependent variables 

C. Design and procedures 

D. Data analysis procedures 



IV. Findings

A. Descriptive findings 

1. Demographics and educational status 

2. Correlations 

3. Accommodation groups and random assignment 

B. Outcome Data 

1. Effects of accommodations: 4th grade students

2. Regression analyses: predictors of math achievement 

3. Discriminant analysis: predictors of group membership 

4. Other analyses 



V. Conclusions and Recommendations 



References 



I. Introduction and Objectives

Appropriate inclusion of English learners (called LEP students in this proposal) and students with 
disabilities in large scale performance assessments is no small challenge, but the potential benefits 
are great. How can appropriate inclusion of such students in assessment programs contribute to 
the improvement of education outcomes for students under reform? Cooley (1991) points out some of
the possible uses of such assessments: they can inform policy, they can reform the curriculum and 
can increase accountability. However, there is no pat answer to the question of how best to 
‘appropriately’ include LEP students and students with disabilities. 
The educational reform initiatives under Goals 2000 and the Improving America’s Schools Act
(IASA) call for assessment innovations in support of high standards to raise the achievement of all
students including LEP students. NAEP has made strides in recent years in addressing students 
with disabilities and LEP student needs, including developing a side by side bilingual test, and 

allowing various accommodations. In addition, IDEA legislation mandates that states test students 
with disabilities. 

In another recent development, federal legislation in 1997 regarding development of a Voluntary 
National Test (VNT) points to the importance of addressing the needs of disabled and LEP 
students, as well as inclusion and accommodation issues. Several such developments have 
converged to focus increased attention on the issue of including LEP students and students with 
disabilities and tracking these students’ achievement and progress accurately. It is critical for state
departments of education as well as local districts to be able to accurately assess and monitor the 
academic progress of all students with their testing programs. 

The role of statewide assessment programs takes on increasing importance under education reform, 
as statewide tests become one of the primary measures of attainment of student performance 
standards. Educators are now looking to find ways to give LEP students and students with 
disabilities access to the full grade level appropriate curriculum and to carry out assessments that 
give these students the opportunity to show what they know and can do. 

Although some agree that the move toward increased use of performance assessments may offer 
students a fairer and more contextualized method of ascertaining what they know and can do, others 
point out that new questions of validity arise. As assessment becomes increasingly embedded in 
instruction, it becomes more and more important for us to examine the validity of modifying or 
mediating assessments for various subgroups and to develop criteria/principles for the fair and valid 
administration of assessments to all students. 

This project will contribute to the advancement of theory and knowledge in the area of the valid and 
fair assessment of all students. In light of the standards movement at the federal and state levels, the 
question is how can second language learners and students with disabilities be fairly held to the
standards as well as included in assessments as much as possible? This project aims to help answer 
this question. 



As mentioned in recent reports (see AIR, 1998a), very little research has been done on the use of 
accommodations with LEP students. NAEP has conducted some research but sample sizes in the 
1996 administration were too small to evaluate the effects of accommodations on the technical 
characteristics of scores. NAEP did find that including scores for students who received 
accommodations did not have a significant effect on overall scale score results. 

The purpose of this research study was to examine the effects of the use of accommodations with
LEP students and non-LEP students, and whether the use of accommodations affects the validity of
test score interpretations. If the answer to either question is “yes”, we need to look at which
accommodations affect test performance in which ways.



The study’s research questions are the following: 



1. What are the effects of using specific accommodations on test scores
of LEP students, LEP students with disabilities and non-LEP students?

2. Do English proficient students benefit equally if allowed the same
accommodations as LEP students?

3. What accommodations provide valid inferences for LEP students
and LEP students with disabilities?



II. Review of the Literature



In discussing the assessment of special needs students, one measurement concept that is important
to consider is equivalence, which refers to the degree to which test scores can be used to make
comparable (valid) inferences across diverse groups. A major concern is construct equivalence: is
a test measuring a construct (such as math knowledge) in a group that is equivalent to the construct
being tested in other groups? In the case of LEP students, how can we determine whether a test
measures the construct (math knowledge) only or whether English language proficiency is also being
assessed? The same question goes for students with disabilities.



In the field of performance assessment, few studies have focused on validating performance by
different language groups. In addition, assessment administration has not been a focus of much
work in the field; most of the interest has been on task development, scoring and general validation.



At the national level, the NAEP exams have allowed several types of accommodations
for students with disabilities and LEP students, depending on the exam and grade level. These
include language accommodations such as a bilingual test book, bilingual dictionary or glossary; test
setting accommodations such as one-on-one testing; extended time; read aloud or repetition of the
test instructions; and accommodations for disabled children such as Braille, large print or computer
equipment accommodations (AIR, 1998a).

Many states (about 3/4) allow accommodations on at least one of their statewide assessments. The
most frequently used are extra time (25 states), test setting accommodations (25-29 states),
repeating directions (28 states), reading questions aloud (21 states), using word lists or dictionaries
(14 states), translation of directions (19 states) and use of alternate assessments (11 states) (CCSSO,
fall 1997).

Some test modifications and accommodations are unlikely to affect test scores and some are likely to
affect scores. The 1985 Standards note that modification of tests for individuals with handicapping
conditions is, in general, desirable (Committee to Develop Standards for Educational and
Psychological Tests, 1985). Much of the research in this area has only been done on individuals
with disabilities. Little research has been done on the validity of test scores of LEP students vs. the
scores of fully English proficient students. The Standards state there are few data to support
conclusions about the effects of time modifications on test results (Committee to Develop Standards 
for Educational and Psychological Tests, 1985). A new revision of the Standards is underway 
which provides more guidelines on assessing limited-English proficient students than in the previous 
The research findings on the effects of giving extra time on essay exams are mixed. Many small or
non-significant effects have been found, even with large differences in time allocation. Research does 
not prove that relaxing time limits significantly benefits any subgroup of examinees more than 
others, but the major subgroups used have been gender and ethnicity. A recent study (Powers and 
Fowles, 1996) found that additional time was equally beneficial to slow, average and fast test takers 
on an essay test (college students rated themselves as slow, medium or fast). In other words, the 
relative performance of slow, medium and fast test takers did not change much when more time was 
allowed (50% more time). Thus, these researchers found that the meaning of the test scores
(construct validity) was unrelated to time limits (Powers and Fowles, 1996, p. 448). Interestingly,
students who said that English was not their best language were less likely to describe themselves as 
being able to write quickly. 



In addition to extra time, other commonly used test accommodations include variations in 
presentation of test stimuli (e.g. simplifying words, reading aloud in English or L1, provision of a
glossary); variations in response possibilities (oral vs. written response); and small group vs.
individual administration. 

Several accommodations were available for students in the NAEP 1996 math and science tests. In 
preliminary analyses of the data for comparing students tested with accommodations vs. those who 
did not have accommodations, NCES found little evidence of differential item functioning, although 
there were some statistical discrepancies. On the whole, including scores for students with 
disabilities and LEP students who received accommodations did not have a significant effect on the 
overall scale score results. However, these conclusions are said to be preliminary. (Mazzeo et al. 



III. Methods

This section will describe the methods of the study, including the subjects and sampling, the 
variables to be used, design and procedures, and data analysis procedures. 

A. Sampling and subjects

School districts and school sites were recruited and selected in the states of California and New Mexico.
Three groups of 4th graders were selected. The first group was made up of students identified as
LEP either by an English language test or program placement. The second group was made up of
non-LEP or English proficient students. The third group was made up of LEP students with
disabilities as identified by an IEP or teacher designation. We originally hoped that a good number
of LEP students with disabilities could be found and analyzed as a distinct group. Very little
research has been done on this group. However, in our sample we did not find large numbers of
these students. Only 22 students in the 4th grade were identified as being in a special education
program. We were thus unable to analyze them separately as a unique group.

The sample selection process for the 4th grade sample was as follows. For the fourth grade, 4
schools with 4th grade classrooms were chosen purposively, matched by SES and size. The schools
were low SES schools of average to large size, with a good proportion of LEP students. Within a
school, at least 3 classes (at the 4th grade level) were chosen that contained some LEP students. At
the 4th grade level, assuming about 30 students per classroom, this would make a total of 15 fourth
grade classrooms for a sample size of about 450. With attrition and dropout, an effective sample
size of about 430 was expected. As a result of attrition, eleven 4th grade teachers tested 292 fourth
graders and seven 7th grade teachers tested 159 students.



At the 7th grade level, we wanted 15 classes: five schools with three classes each for about 450 
students. With attrition and dropout, an effective sample size of only 160 was obtained. Classes 
sampled included general math courses made up of some English proficient and some LEP students. 
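
The planned and obtained sample sizes reported for the two grades can be compared with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it takes the planning figures stated in the text (15 classes of about 30 students per grade) together with the obtained counts of 292 and 159 students.

```python
def planned_sample(classes, students_per_class):
    """Planned sample size: number of classes times assumed class size."""
    return classes * students_per_class


def retention_rate(obtained, planned):
    """Share of the planned sample that was actually tested."""
    return obtained / planned


# Planning figures taken from the text: 15 classes of about 30 students per grade.
planned = planned_sample(15, 30)

# Obtained counts reported in the text for each grade.
grade4_rate = retention_rate(292, planned)
grade7_rate = retention_rate(159, planned)

print(f"Planned per grade: {planned}")
print(f"4th grade tested: {grade4_rate:.1%} of plan")
print(f"7th grade tested: {grade7_rate:.1%} of plan")
```

A comparison of this kind makes the size of the shortfall explicit, which matters when judging the power of the later group comparisons.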

Teachers were recruited from the chosen schools and trained in administering the tests and collecting
survey data. Stipends were given to them for their work. A school site coordinator at each school 
coordinated the training workshop, materials distribution and mailing and was paid a stipend. 

B. Independent and Dependent Variables



Accommodations were chosen for this study because (a) many states, including Delaware, as
well as districts report using them; (b) they appear likely to influence some student scores without
changing the construct tested; and (c) they capture important aspects of test performance and
comprehension. Accommodations that may influence or change the construct to be measured (such
as reading the test items aloud to students) were not proposed here, as many believe that the use of
such modifications indicates something different and usually results in scores being reported
separately.



Accommodations that have been shown not to change the construct being tested were used in this
study. They include:

1) Extended Time (ET): Extended time as an accommodation is widely used and has been
shown to help improve scores on tests for some groups of examinees. LEP students may need
more time to translate words or to comprehend the questions asked. Students with disabilities
may need extra time because of learning disabilities or other disabilities. Based on the results of
previous studies, the amount of extra time was 50% more time than standard.

2) Extended Oral Presentation (EOP): Teachers were allowed to simplify test directions, re-read
directions, provide additional examples, or read directions in students’ native language. The
activities that are permitted will be listed for the teachers. Extra time was also given with this
accommodation, as teachers need additional time to do these activities.



In regard to dependent variables, the CTB/McGraw Hill Terranova math test was given in English
to students. This test is recommended, as it has known properties, validity, and
reliability, and contains multiple choice and constructed response items. A recent study used Terranova
as well as a parallel test in Spanish (Supera) in the 1998 administration, but very few students were
actually getting most of their instruction in Spanish, and thus only 150 took the test in Spanish
(Hafner, 1999). Also, the current policy interest seems to be in testing the students in English while
giving accommodations, rather than providing the test in Spanish. The content area of math was
chosen, as it is different from language arts, and is less language dependent than other subject areas.
Students in 4th and 7th grades were administered a short math basic skills test prior to the
standardized math test, to obtain a measure of student ability or aptitude in math. They were also 
given the LAS reading test, which can be used either as a covariate or as a predictor of achievement. 

Presentation to all students was made in terms of two half tests that were created using Terranova’s 
Form A. The two half tests are roughly equivalent in terms of difficulty, content domain coverage, 
and both have multiple choice and constructed response items. Scores from each of the half tests 
can be reported on a common standardized scale. Terranova’s multiple assessment includes selected 
response and constructed response items. The two item types can be scaled together. In addition, 
both norm referenced and curriculum-referenced scores (proficiency levels) can be produced. 

C. Design and Procedures 

The design is a quasi-experimental model. In this model, students participated in one form of the 
test accommodations (standard or accommodated). Classes were randomly assigned to one of the 
conditions. Table 1 shows the makeup of classes and students per grade and condition.

Table 1: Number of classes and students per condition

Condition                                       4th Grade    7th Grade
Condition 1 - Extra Time (ET)                   5 (129)      3 (55)
Condition 2 - Standard Administration           3 (83)       2 (41)
Condition 3 - ET + Extended Oral Presentation   3 (80)       2 (63)
Total                                           11 (292)     7 (159)



In the fourth grade the conditions were regular time, extra time and extended oral presentation (help
with instructions). The three groups of students were LEP students, LEP students with disabilities
and non-LEP students, thus a 3 x 2 factorial ANOVA design (see Table 2 for 4th grade design). For
the seventh grade, the conditions were regular time, extra time, and extended oral presentation (see
Table 3 for 7th grade design). Regular students and LEP students were included at 7th grade, for a 3 x
2 factorial design.



Table 2. ANOVA design, 4th grade

                     Regular time   Extra time   Extended oral
Non-LEP students     60             60           60
LEP students         40             40           40
Total                100            100          100             N = 300



Table 3. ANOVA design, 7th grade

                     Regular time   Extra time   Extended oral
Non-LEP students     25             34           40
LEP students         16             21           24
Total                41             55           64              N = 160



At each site, a site coordinator and/or teacher administered the tests and collected the data on 
students and received a stipend for his/her work. The site coordinator or teachers were trained by 
the PI. Teachers administered the tests, filled out a survey for each child and collected other data on
students. Training procedures were developed and carried out in the author’s previous study in 
1998, and were fine tuned for this administration. 

In addition to the math test data, other student information was collected at the school site. To
fully understand student performance, background and demographic information were collected via
survey, along with other outcome variables to enable validation of the instrument. In 1998, an initial
set of questions and data sources were piloted with teachers to try out items. Primarily, the
variables include background variables (ethnicity, gender, primary language, self assessment of
English proficiency, language classification status, years in school in the US, age, attendance record,
SES) and educational variables such as grades, scores on norm referenced tests, primary language of
instruction, and teacher ratings of student ability in math and reading (see Appendix E for the list of
variables).

At the university, pre-edit checks were conducted on the surveys. After data entry, post edit 
checks including consistency and range checks were performed to ensure quality data. Eventually, 
survey data were merged with CTB test data and analyzed. 
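
A range check of the kind described above can be sketched as follows. This is a minimal illustration, not the project's actual editing procedure: the field names are hypothetical, and the allowed ranges are taken from the scales printed alongside the variables in Table 5.

```python
# Allowed (low, high) values per field. The ranges follow the scales printed
# in Table 5; the field names themselves are hypothetical.
VALID_RANGES = {
    "read_grade": (0, 4),
    "math_grade": (0, 4),
    "read_skill": (1, 5),
    "math_skill": (1, 5),
    "math_test": (0, 20),
    "las_standard": (2, 100),
}


def range_errors(record):
    """Return the names of fields whose values fall outside the allowed range.

    Missing values (None or absent keys) are skipped rather than flagged.
    """
    errors = []
    for field, (low, high) in VALID_RANGES.items():
        value = record.get(field)
        if value is None:
            continue
        if not low <= value <= high:
            errors.append(field)
    return errors


# A made-up survey record with one out-of-range entry.
example = {"read_grade": 3, "math_test": 25, "read_skill": None}
print(range_errors(example))
```

Records that return a non-empty list would be sent back for comparison against the paper survey before merging with the test data.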

D. Data Analysis Procedures

To answer research question #3 (establishing validity for different groups of students), the validity
of the constructs measured was ascertained by using correlations of test scores with other variables
such as grades and teacher ratings. Research questions #1 and #2 (on the effects of accommodations
for different groups) were addressed by using a MANCOVA with math ability as covariate,
testing for main effects for accommodation used and student subgroups.
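
The logic of testing a main effect for accommodation can be illustrated with a one-way ANOVA F statistic. The sketch below is a simplification under stated assumptions: it omits the covariate and the second factor used in the study's models, the function name is hypothetical, and the scores are invented for illustration and are not data from the study.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists of scores, one list per condition.
    """
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented scale scores for three conditions (NOT data from the study).
standard = [590, 600, 610]
extra_time = [615, 620, 625]
extra_time_oral = [600, 605, 610]

f_stat, df_b, df_w = one_way_anova_f([standard, extra_time, extra_time_oral])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

In a covariance analysis the same ratio is formed after the scores have been adjusted for the covariate (here, math ability), so that condition differences are not confounded with prior skill.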

CTB/McGraw Hill analysts developed the half tests, and have evaluated the quality of the half
tests, including domain coverage and difficulty, as well as their parallel structure. In addition to
quantitative analyses, data from student writing samples were analyzed qualitatively for patterns
and trends.

To run analyses of variance and covariance, some of the variables had to be re-coded. The
following re-coding was performed. Due to the small number of students, accommodation variable
CONDTN was re-coded from 3 categories (no accommodation and two forms of accommodation) to
two categories (no accommodation/accommodation). This dummy variable was used in the multiple
regression models. However, for analyses of variance and covariance, all three categories of
accommodations were used. Number of years lived in US (TIMEUS) was re-coded to change the code
for don’t know from 0 to Missing, since 0 means no time in the US.
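
The two re-coding steps described above can be sketched as follows. The variable names CONDTN and TIMEUS come from the text; the numeric codes are an assumption based on the condition numbering in Table 1 (1 = extra time, 2 = standard administration, 3 = extra time plus extended oral presentation), and the function names are hypothetical.

```python
# Assumed CONDTN codes, following the condition numbering in Table 1:
# 1 = extra time, 2 = standard administration, 3 = ET + extended oral presentation.
NO_ACCOMMODATION_CODE = 2


def recode_condtn_dummy(condtn):
    """Collapse the three-category CONDTN into a dummy variable.

    Returns 0 for no accommodation, 1 for either form of accommodation,
    and None when the condition itself is missing.
    """
    if condtn is None:
        return None
    return 0 if condtn == NO_ACCOMMODATION_CODE else 1


def recode_timeus(timeus):
    """Treat the original 'don't know' code (0) as missing (None)."""
    return None if timeus == 0 else timeus


# Made-up records showing both re-codes applied together.
records = [{"CONDTN": 1, "TIMEUS": 4}, {"CONDTN": 2, "TIMEUS": 0}]
recoded = [
    {"ACCOM": recode_condtn_dummy(r["CONDTN"]), "TIMEUS": recode_timeus(r["TIMEUS"])}
    for r in records
]
print(recoded)
```

Keeping the original three-category variable alongside the dummy allows the regression models and the analyses of variance to draw on the same data set.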

IV. FINDINGS 

The study’s findings are presented as follows. First, descriptive findings such as demographics, 
educational status, correlations and accommodation groups by group assignment are presented. 

Next, outcome data findings are presented. These include the ANOVAs and ANCOVAs on the 
effects of accommodations, regression analyses, and discriminant analyses. Descriptive findings are 
presented here for the 4th and 7th grade students. CTB/McGraw Hill has not yet given us the full
test data for the 7th graders, so the additional findings will be presented at a later date.
A. Descriptive findings

1. Demographics and educational status

Tables 4a and 4b show the major demographics for the 4th and 7th grade students.

About 60% of the 4th grade sample were Hispanic, 15% white, 20% African American, and 4%
other. Most (83%) were in free lunch and a majority were in Title I (70%). About half (74) were in
an LEP program. Only 16% or 22 were in a special education program and 41% were classified as
LEP by LAS reading score.



Table 4a. Demographics, 4th grade

Demographics                              N, %
Male                                      120, 48%
Female                                    128, 52%

White                                     36, 15%
African American                          48, 20%
Hispanic                                  149, 61%
Asian/Pacific Islander                    4, 2%
American Indian/Native Alaskan/Other      6, 2%

Free lunch                                192, 83%
Title I                                   99, 70%
LEP Program                               74, 53%
Special Education Program                 22, 16%
LEP (LAS Score)                           82, 41%

Note: Number of students does not agree with totals on other tables
because of missing data on one or more variables.

About 91% of the 7th grade sample were Hispanic, 3% white, 2% African American, and 5% other.
Most (78%) were in free lunch and almost all were in Title I (94%). Only 8 students were in an
LEP program. Only 19% or 18 students were in a special education program and 28% were
classified as LEP by LAS reading score, fewer than at 4th grade.



Table 4b. Demographics, 7th grade

Demographics                              N, %
Male                                      61, 50%
Female                                    61, 50%

White                                     3, 3%
African American                          2, 2%
Hispanic                                  107, 91%
Asian/Pacific Islander                    6, 5%

Free lunch                                93, 78%
Title I                                   88, 94%
LEP Program                               8, 9%
Special Education Program                 18, 19%
LEP (LAS Score)                           35, 28%

Note: Number of students does not agree with totals on other tables
because of missing data on one or more variables.
Table 5 displays the means, standard deviations and numbers for demographic variables for the fourth
graders. The scale for each variable is included in the table. The mean on English fluency is

l?veSl„ L 5s •h' "“dents had 



Table 5: Educational Status, 4th graders

Variables*                                  N      Mean       St. Dev
Degree of English fluency (1-4)             248    3.31       .89
Degree of Spanish fluency (1-4)             240    2.50       1.34
No. of years lived in US (0-4)              172    3.76       .57
Reading grade (0-4)                         190    2.59       .98
Math grade (0-4)                            192    2.54       1.03
Reading skill (1-5)                         220    3.08       1.06
Math skill (1-5)                            220    3.06       1.02
LAS Standard score (2-100)                  204    79.40      19.26
Math test score (0-20)                      213    13.24      3.83
Yrs of Engl. instruction received (0-3)     197    2.86       .36
Received instruction in Sp/other (1-2)      180    1.77       .42
How long reading in English (0-4)           222    3.69       .72
Scale score 1 (Terranova)                   219    608.84     52.46
Scale score 2 (Terranova)                   219    615.56     45.89
Scale score total (Terranova)               219    1224.40    85.68
LEP status (1-3) (3=FEP)                    204    1.60       .49

*See Appendix E Variable and Coding for scales.

Note: Number of students does not agree with totals on other tables because of missing
data on one or more variables.



Table 6 displays the means, standard deviations, and numbers for the 7th grade students. As can be
seen, they are similar to the 4th graders on their means on most variables. The 7th graders scored
higher than the 4th graders on degree of Spanish fluency and slightly lower on math grades and
reading.



Table 6: Educational Status, 7th graders

Variables*                                  N      Mean       St. Dev
Degree of English fluency (1-4)             120    3.38       .83
Degree of Spanish fluency (1-4)             120    2.79       1.22
No. of years lived in US (0-4)              119    3.74       .63
Math grade (0-4)                            104    1.90       1.37
Reading skill (1-5)                         112    2.97       1.06
Math skill (1-5)                            123    3.05       .97
LAS Standard score (2-100)                  123    83.46      14.29
Math test score (0-20)                      123    13.03      [?].99
Yrs of Engl. instruction received (0-3)     119    2.91       .43
Received instruction in Spanish (1-2)       119    1.83       .44
Scale score 1 (Terranova)                          N/A        N/A
Scale score 2 (Terranova)                          N/A        N/A
Scale score total (Terranova)                      N/A        N/A
LEP status (1-3) (3=FEP)                           N/A        N/A

*See Appendix E Variable and Coding for scales.

Note: Number of students does not agree with totals on other tables because of
missing data on one or more variables.

2. Correlations 



Table 7 shows the inter-correlations for all 4th graders among the Terranova math score, LAS reading
test score, reading and math grades, math computation test score, degree of English and Spanish
fluency, and years of English instruction. The tests show moderately high inter-correlations
(between .64 and .69). Reading and math grades are highly correlated at .83. English fluency is highly
negatively correlated with level of Spanish fluency (-.82), indicating that a higher degree of English
fluency is related to a low degree of Spanish fluency. Years of English instruction is moderately
positively correlated with reading grade, math grade, English fluency and the LAS reading score
(range is between .23 and .52). Years of English instruction is negatively related to level of Spanish
fluency (r=-.45**), thus those children with more years of English instruction have a lower level of
Spanish fluency. Although there was a significant correlation between English fluency and
Terranova math score and math test, there was no relationship between years of English instruction
and math test scores.
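
The coefficients discussed here are Pearson correlations. A minimal sketch of the computation is given below; it assumes pairwise deletion of missing values (suggested by the varying Ns across variables, but not stated in the report), and the function name and the sample values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation using only pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n

    # Sum of cross-products and sums of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


# Invented ratings (NOT study data): English fluency vs. Spanish fluency,
# with one missing Spanish rating that is dropped pairwise.
english = [4, 4, 3, 2, 1]
spanish = [1, 2, None, 3, 4]
print(round(pearson_r(english, spanish), 2))
```

A strongly negative value from data shaped like this mirrors the pattern reported for English and Spanish fluency, where higher ratings on one scale go with lower ratings on the other.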



Table 8 shows the correlations for 4th grade LEP students only. These correlations are somewhat
lower than those for non-LEP students. For LEP students, there was a negative (but not significant)
correlation between English fluency and Terranova math and math test scores. The level of Spanish
fluency was positively correlated with LAS Reading score, but this was not significant. As with the
correlations for all students, years in English instruction were significantly correlated with reading
grade (r=.32), English fluency (r=.45), and Spanish fluency (-.45). As contrasted with non-LEP
students, LEP students did not show a significant correlation between English fluency level and LAS
reading score, math test score, Terranova test score, reading and math grade. However, LAS reading
score was significantly related to the tests and grades, for LEP and non-LEP students. The English
fluency rating, done by teachers, may reflect only oral fluency, which may not help the LEP
students do well on tests.



Table 9 displays the correlations for non-LEP 4th grade students. These correlations are very similar
to those from Table 7, for all 4th grade students.



Table 7. Correlations, all students, 4th grade

             TNmath   LAS-R    Mathtest      Readgrad   Mathgrad   EngFlu    SpanFlu   YrsEng
TN Math      1.0
LAS-R        .64**    1.0
Math test    .69**    .65**    1.0
Readgrad     .59**    .69**    .42**         1.0
Math grad    .63**    .68**    .50**         .83**      1.0
English flu  .23**    .25**    [illegible]   .25**      .20**      1.0
Spanfluenc   -.17*    -.06     -.06          -.09       -.07       -.82**    1.0
Yrs Engli    .05      .23**    -.03          .33**      .24**      .52**     -.45**    1.0



Table 8. Correlations, LEP students, 4th grade


Table 9. Correlations, non-LEP students, 4th grade

             TNmath   LAS-R    Mathtest   Readgrad   Mathgrad   EngFlu    SpanFlu   YrsEng
TN Math      1.0
LAS-R        .52**    1.0
Math test    .59**    .27**    1.0
Readgrad     .62*     .57**    .31**      1.0
Math grad    .61**    .52**    .47**      .76**      1.0
Engl fluen   .14      .30**    .21*       .21*       .08        1.0
Span fluen   .23*     -.15     -.16       -.06       .001       -.82**    1.0
Yrs Engl     .16      .17      -.05       .24*       .07        .55**     -.42**    1.0

* p<.05  ** p<.01



Table 10 shows the correlations for most of the 7th grade students. Patterns of inter-correlations are
similar to those for all of the 4th graders.



Table 10. Correlations, all students, 7th grade

             TNmath   LAS-R         Mathtest      Mathgrade     EngFlu    SpanFlu   YrsEng
TN Math      1.0
LAS-R        .64**    1.0
Math test    .62**    .50**         1.0
Math grad    .55**    [illegible]   .60**         1.0
English flu  .35**    .53**         [illegible]   [illegible]   1.0
Spanfluenc   -.21*    -.22*         -.06          .07           -.75**    1.0
Yrs Engli    .19      .41**         .08           -.10          .45**     -.22*     1.0

* p<.05  ** p<.01



Accommodation Groups and Random Assignment 



A cross tabulation was run to examine whether LEP students were randomly selected into different 
accommodation groups or classes. Although LEP students were more likely to be in the No 
accommodation condition, and non-LEPs were more likely to be in the Extra time + extended oral 
directions condition, the differences were not significant. See Table 11 for details. 

Table 11: Accommodation group by LAS status 

                                         LEP              Non-LEP
                                         (LAS = 1 or 2)   (LAS = 3)      Total
Extra Time                               36 (43.9%)       52 (42.6%)     88 (43.1%)
No Accommodation                         28 (34.1%)       35 (28.7%)     63 (30.9%)
Extra Time + Extended Oral Directions    18 (22.0%)       35 (28.7%)     53 (25.9%)
Total                                    82 (100.0%)      122 (100.0%)   204

Note: Totals do not agree with those on other tables because of missing data. 
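
The cross tabulation in Table 11 can be checked with a Pearson chi-square test of independence. The report does not say which test statistic was used, so the following is only a sketch under that assumption; the function names are illustrative. Because the table has 2 degrees of freedom, the p-value has the closed form exp(-chi2/2), which avoids any statistics library.

```python
import math

# Observed counts from Table 11: rows are accommodation groups,
# columns are LEP (LAS = 1 or 2) and non-LEP (LAS = 3).
TABLE_11 = [
    [36, 52],  # Extra Time
    [28, 35],  # No Accommodation
    [18, 35],  # Extra Time + Extended Oral Directions
]


def chi_square_independence(table):
    """Return (chi2, df) for a Pearson chi-square test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def p_value_df2(chi2):
    """Upper-tail p-value of a chi-square statistic; valid only for df = 2."""
    return math.exp(-chi2 / 2.0)


chi2, df = chi_square_independence(TABLE_11)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value_df2(chi2):.2f}")
```

Under this assumption the statistic is small relative to its degrees of freedom, which agrees with the statement that the group differences were not significant.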



B. Outcome Data 



Effects of Accommodations: 4th grade students 

Table 12 shows the means and number of students by accommodation condition. As can be seen, 
the means are highest for the extra time condition. Table 13 shows the means on the two scale 
scores by LEP status. Students scoring at LAS level 3 (fully English proficient) scored 
significantly higher than students scoring at level 1 on LAS (not English proficient) (F=40.38). 



Table 12. Means & number of subjects by accommodation condition 

Accommodation condition          Scale score 1          Scale score 2
No accommodation                 x̄ = 598.11, n = 64     x̄ = 602.6, n = 64
Extra time                       x̄ = 619.24, n = 93     x̄ = 634.32, n = 93
Extra time + extra oral pres.    x̄ = 604.32, n = 61     x̄ = 600.77, n = 62
TOTAL                            x̄ = 608.85, n = 219    x̄ = 615.56


Table 13. Means & number of subjects by LEP status 

LEP status (LAS level)           Scale score 1          Scale score 2
LAS 1 (not Engl proficient)      x̄ = 550.5, n = 17      x̄ = 605.76, n = 17
LAS 2 (LEP)                      x̄ = 587.04, n = 36     x̄ = 616.1, n = 35
LAS 3 (FEP)                      x̄ = 637.02, n = 90     x̄ = 630.56, n = 90
TOTAL                            x̄ = 614.15, n = 143    x̄ = 623.97, n = 90
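
The TOTAL rows in Tables 12 and 13 should equal the n-weighted average of the group means. The sketch below recomputes three of them from the printed group means and group sizes; the helper name is illustrative, and small differences from the printed totals are expected because the group means are rounded.

```python
def weighted_mean(groups):
    """Return the n-weighted grand mean of a list of (mean, n) pairs."""
    total_n = sum(n for _, n in groups)
    return sum(mean * n for mean, n in groups) / total_n


# (mean, n) pairs copied from the printed tables.
table12_ss1 = [(598.11, 64), (619.24, 93), (604.32, 61)]
table12_ss2 = [(602.6, 64), (634.32, 93), (600.77, 62)]
table13_ss1 = [(550.5, 17), (587.04, 36), (637.02, 90)]

for label, groups in [
    ("Table 12, scale score 1", table12_ss1),
    ("Table 12, scale score 2", table12_ss2),
    ("Table 13, scale score 1", table13_ss1),
]:
    print(f"{label}: grand mean = {weighted_mean(groups):.2f}")
```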



Results of ANOVA and ANCOVA 

The two scale scores, SS1 and SS2, were used as dependent variables in two separate 
ANCOVA and ANOVA models, with accommodation code CONDTN serving as the independent 
variable. In the ANCOVA model, the computational math test score MATHTEST was used as a 
covariate, to control for math knowledge. 



The results of ANCOVA analyses of Model 1 (SS1 as the dependent variable) showed no 
significant difference of the means by accommodations (F=.29, df=2,151, p=.748). However, the 
covariate was significant (F=173.31, df=1,151, p=0.000). But the results of an ANOVA model 
(with no covariate) indicated that the accommodation effect was significant (F=3.74, df=2,216, 
p=.033) (see Table 14). 



The results of ANCOVA Model 2 (SS2 as the dependent variable) showed a significant 
accommodation effect (F=6.96, df=1,151, p=.001). When the math test score was removed 
from the model as the covariate (ANOVA model), the effect of accommodation increased (F=15.31, 
df=2,216, p=0.000) (see Table 15). That is, the accommodation effect was significant even when a 
covariate was used. It must be noted at this point that in the ANOVA and ANCOVA models 
we lost a relatively large number of subjects due to missing data (138 from the total of 292). 



Table 14. Analysis of variance, SS1 as dependent variable 

Source of Variation      SS          df     MS         F       Sig. F
Accommodation effect     18693.02    2      9346.51    3.74    .033*
Within (error)           581224.1    216    2690.85
Total                    599917.1    218    2751.91





Table 15. Analysis of variance, SS2 as dependent variable 

Source of Variation      SS          df     MS         F        Sig. F
Accommodation effect     56997.04    2      28498.5    15.31    .000**
Within (error)           402047.6    216    1861.33
Total                    459044.6    218    2105.71
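
As a check on the arithmetic in Tables 14 and 15, the sketch below recomputes the mean squares, F ratios and p-values from the printed sums of squares and degrees of freedom. The function names are illustrative. The p-value uses a closed form that holds only when the numerator has 2 degrees of freedom, as it does here. With these inputs, Table 15 reproduces the printed F of 15.31, while Table 14 works out to about 3.47 rather than the printed 3.74; the printed p of .033 agrees with 3.47, so the printed F may contain transposed digits.

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """Return (MS between, MS within, F) for a one-way ANOVA table."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


def p_value_numerator_df2(f, df_within):
    """Upper-tail p-value of F; valid only when the numerator df is 2."""
    return (1.0 + 2.0 * f / df_within) ** (-df_within / 2.0)


# Sums of squares and degrees of freedom as printed in Tables 14 and 15.
ms_b14, ms_w14, f14 = f_ratio(18693.02, 2, 581224.1, 216)
ms_b15, ms_w15, f15 = f_ratio(56997.04, 2, 402047.6, 216)

p14 = p_value_numerator_df2(f14, 216)
p15 = p_value_numerator_df2(f15, 216)

print(f"Table 14 (SS1): F = {f14:.2f}, p = {p14:.3f}")
print(f"Table 15 (SS2): F = {f15:.2f}, p = {p15:.6f}")
```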







2. Regression Analyses 

In order to conduct the regression, the accommodation code was re-coded as a dummy variable 
(0=no accommodation, 1=accommodation). This dummy variable was used as a predictor along with 
the following variables in the multiple regression: 

ENGLFLU (English fluency) 

LASLEVEL (LAS levels) 

READSKIL (reading skills) 

SPANFLU (Spanish fluency) 

TIMEUS1 (time lived in the US, recoded) 

For all fourth graders, two regression models were created. The first model used scale score 1 as the 
criterion variable, with the accommodation dummy variable and the other variables (listed 
above) as predictors. The second model used scale score 2 as the criterion variable with the 
same set of predictors. 
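
The recoding of the three-level accommodation code into a single dummy variable can be sketched as follows. The condition labels and the function name are hypothetical, since the report does not give the actual values stored in CONDTN; the group sizes are the scale score 1 counts from Table 12.

```python
# Hypothetical labels for the three accommodation conditions.
NO_ACCOMMODATION = "no accommodation"
EXTRA_TIME = "extra time"
EXTRA_TIME_ORAL = "extra time + extended oral presentation"

VALID_CONDITIONS = {NO_ACCOMMODATION, EXTRA_TIME, EXTRA_TIME_ORAL}


def accommodation_dummy(condition):
    """Return 0 for no accommodation and 1 for either accommodation."""
    if condition not in VALID_CONDITIONS:
        raise ValueError(f"unknown condition: {condition!r}")
    return 0 if condition == NO_ACCOMMODATION else 1


# Group sizes for scale score 1 as printed in Table 12.
group_sizes = {NO_ACCOMMODATION: 64, EXTRA_TIME: 93, EXTRA_TIME_ORAL: 61}

# Collapse the three groups into the two dummy-coded groups.
dummy_sizes = {0: 0, 1: 0}
for condition, n in group_sizes.items():
    dummy_sizes[accommodation_dummy(condition)] += n

print(dummy_sizes)
```

Collapsing the two accommodation conditions into one category means the regression estimates an average accommodation effect and cannot separate extra time from extra time with extended oral presentation.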



Model 1 yielded an R square of .584 (over 58% of the variance of the criterion variable was 
explained by the predictors). Among the predictors, accommodation (t=3.19, p<.001), 
LAS level (t=4.44, p<.000) and reading skill (t=3.86, p<.00) were significant in Model 1. 

Model 2 yielded an R square of .308 (about 30% of the variance was explained). In this model, the 
accommodation variable (t=2.72, p<.01) and reading skill (t=2.72, p<.01) were significant 
predictors. 

Separate regressions were run for LEP and non-LEP student groups. The two regressions run for 
LEP students showed R squared values of .28 (SS1) and .13 (SS2). Significant predictors for LEP 
children were reading skill (t=3.1, p<.00) for SS1 and a marginally significant time in US for SS2 
(t=1.85, p<.07). 



For non-LEP children, the regressions showed R squared values of .33 for SS1 and .33 for SS2. 
Significant predictors were receiving an accommodation (t=4.1, p<.01) and reading skill (t=2.87, 
p<.01) for SS1 and receiving an accommodation (t=3.25, p<.002) and reading skill (t=2.76, p<.008) 
for SS2. It is interesting to note that although LAS level (a proxy for LEP status) was a significant 
predictor in the overall regression for all students, when the LEP and non-LEP student groups were 
run separately, results diverged. Only reading skill was a significant predictor for the LEP group. 

3. Discriminant Analysis: Predictors of Group Membership 

A discriminant analysis was done, using the following variables: Spanish fluency (SPANFLU), time 
in US (TIMEUS), LAS reading score (LASSTAN), math test score (MATHTEST), reading grade 
(READGRAD), and whether the student received instruction in Spanish (INSSPAN). This was 
done to ascertain which variables best predicted student placement in one of three categories by 
their teachers: limited English proficient, fluent in English as second language, or English as first 
language. Two significant functions were derived. The first one was dominated by the Spanish 
fluency variable. The second function was mainly made up of time in US, reading grade and math 
test score. 



In looking at group means (see Table 16), we see that Group 2 (LEP) scored high on function 1 
(Spanish fluency) and low on function 2 (time in US and reading ability). Group 3 (fluent in 
English as 2nd language) scored highest on function 2 (time in US and reading ability) and relatively 
high on function 1 (Spanish fluency). Group 4 (English as 1st language) scored the lowest on 
function 1 (Spanish fluency) and in between groups 2 and 3 on the second function (time in US and 
reading ability). 

Table 16. Group means at functions 

                                      FUNCTION 1    FUNCTION 2
Group 2: Limited Engl. proficient     2.29          -.93
Group 3: Fluent Engl. 2nd language    1.69          .95
Group 4: English 1st language         -2.69         -.08
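
To illustrate how the group means in Table 16 relate to classification, the sketch below assigns a student to the group whose centroid is nearest in the space of the two discriminant functions. This is a simplification and not necessarily the procedure used in the study: it assumes equal prior probabilities and equal within-group spread on both functions. The function name and the example scores are hypothetical.

```python
import math

# Group centroids (function 1, function 2) as printed in Table 16.
CENTROIDS = {
    "Group 2: Limited English proficient": (2.29, -0.93),
    "Group 3: Fluent English, 2nd language": (1.69, 0.95),
    "Group 4: English 1st language": (-2.69, -0.08),
}


def nearest_group(function1, function2):
    """Return the group whose centroid is closest to the given scores."""
    return min(
        CENTROIDS,
        key=lambda group: math.dist((function1, function2), CENTROIDS[group]),
    )


# Hypothetical discriminant scores for three students.
for scores in [(2.0, -1.0), (1.5, 1.0), (-2.5, 0.0)]:
    print(scores, "->", nearest_group(*scores))
```

Groups 2 and 3 lie close together on function 1 and are separated mainly by function 2, so under this simplified rule a modest shift in a student's function 2 score is enough to move the student between those two groups.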



Interestingly, group 3 is almost as high as group 2 on Spanish fluency, but outscores group 4 
(English as first language) on function 2, which is made up of time in US, reading grade and math 
test score. This is a small but important group (about 40 children) which should be examined. See 
Figure 1 following for the territorial map. 



Insert Figure 1 about here 



A classification analysis was also done in which subjects were grouped to see what percentage were 
correctly classified. While 87% of English as first language students and 83% of fluent in English as 
a 2nd language students were correctly classified, only 44% of LEP students (group 2) were correctly 
classified. 44 students were ungrouped originally, and 43 of these were classified in the fluent in 
English as second language group. The total percentage of students correctly classified was 74%. 



Figure 1. Territorial Map (* indicates a group centroid; assuming all functions but the first two are zero) 

Other Analyses 



Writing samples that were embedded in the fourth grade exam were examined. The examination 
followed the principles of grounded theory (Strauss, 1987), focusing on data that pertain to the 
study of language and writing development. Several patterns emerged from this examination of the 
writing samples. 

Examination of the writing samples clearly showed differences based on each student’s language 
proficiency level. Students who were less proficient in English demonstrated very limited 
understanding of the math question that was asked; in other words, their limited language 
proficiency interfered with their understanding of the required task. These students often gave literal 
responses and/or simply translated the numbers into words (5 = five). They also had difficulty with 
even simple math tasks that required understanding of the written text to complete. This indicates 
that these less English proficient students had not yet developed their cognitive academic language 
proficiency or CALP (Cummins, 1999) and did not yet have enough English to understand the text 
and therefore the math task. 



Along with that, many less proficient students demonstrated difficulty with both the math and the 
language. For example, it was common for the less proficient students to make errors on even simple 
questions, both in the math computations involved and in the aforementioned 
misreading of the required task. Furthermore, these students also showed a lack of familiarity with 
math words in English; some missed words as simple as subtraction (and performed a different math 
function). 

While some students who had higher levels of English proficiency still had difficulty writing math 
responses, they were better able to match the task to what was required. For example, one student 
wrote, “I saw all the shapes, then in my mind I saw the shapes go on and that is how I got it” in 
response to a patterning question. Another student responded, “I counted the people” when asked 
how she estimated the number of people in the drawing. 

The students who were more English proficient on the whole wrote more elaborate responses and 
used math language throughout their responses, e.g., estimation, patterns, etc. These students 
demonstrated a better understanding of math concepts as well as an understanding of math 
terminology. They also were able to describe the strategies they used to answer questions. 

It is interesting to note that all students demonstrated accuracy in English spelling in their writing 
samples. Less English proficient students wrote shorter, more simplistic responses that did not 
always answer the questions posed, but were able to spell the words they chose conventionally. 

V. Conclusions and Recommendations 

The purpose of this study was to examine the effects of the use of accommodations with LEP and 
non-LEP children, and whether the use of accommodations affects the validity of test score 
interpretations. 

To answer our first research question, overall the use of accommodations did affect student test 
scores. In particular, students given the extra time accommodation showed higher mean scores. 
Regression analyses showed that receiving an accommodation did not significantly predict math 
achievement for LEP students, but did predict achievement for non-LEP children. 

The answer to the second question, do English proficient students benefit from accommodations?, is 
a strong yes. In addition, non-LEP students showed a greater effect for accommodations. 

Our third question, which accommodations provide valid inferences for LEP students, is not a 
straightforward one to answer. LEP students show slightly lower correlations than non-LEP 
students. For LEP students, level of English fluency predicted only Spanish fluency (-.88) and 
years of English instruction (.45**). For non-LEP children, English fluency level predicted LAS 
reading score, math test score, reading grade, as well as years of English instruction and Spanish 
fluency (negative). 

LEP students were more likely than non-LEP students to have low test scores, grades and skills. 
Although LAS level (a proxy for LEP status in terms of reading) was a significant predictor in the 
overall regression for all students, for the group of LEP students, only teacher rating of reading skill 
significantly predicted Terranova math score. For non-LEP students, reading skill and receiving an 
accommodation predicted math achievement. The Terranova math test seems to measure English 
reading proficiency in addition to math knowledge and skills. Extra time may enable students to 
translate words needed to solve problems. This may be especially true for word problems. 

Shepard et al (1998) recommend that an accommodation should improve the performance of LEP 
students but should not improve the performance of English proficient students. In this study both 
groups benefited. Thus, if extra time is offered by a state or local district, it should be offered to all 
students, not just LEP students. 

Accommodations that allow students access to the test should be offered, as long as they do not 
influence the validity of the inferences made from them. Use of the extra time accommodation 
seems like a small price to pay to allow LEP students to show what they know and can do. 
However, non-LEP students should probably also be offered the extra time accommodation, in 
fairness. 

Other findings include the fact that the LEP student group showed a great degree of heterogeneity, 
as evidenced by their large standard errors in the analyses. In addition, results of the discriminant 
analysis showed that LEP students were twice as likely as English as a first language students to 
be misclassified in the analysis. 

More exploratory work needs to be done to examine the fluent in English as a second language group 
and to tease apart issues in the classification and identification of language fluency groups. 



It should be kept in mind that this study should be considered an exploratory analysis. Because of 
small sample sizes in the cells of the design, there may be some confounding going on. We have 
learned that it is not easy to make generalizations about LEP students. It appears to be necessary to 
isolate unique accommodations for different subgroups, as well as for non-minority groups. 

In addition, it may be reasonable to move away from a research paradigm in which we make blanket 
generalizations about testing of all LEP students or about students in bilingual or ESL programs 
and move toward an individual model. Shepard et al (1998) note that very few LEP students 
receive accommodations specific to their language needs. Many schools and districts accommodate 
“all or none” of the LEP students or students with disabilities. Shepard et al (1998) suggest more 
training of school personnel so they can make better informed recommendations more targeted to the 
needs of individual English language learners. 